1 Exploratory data analysis in R for spatial modeling

In this chapter, we will see how to explore data to:

  • Generate questions about your data.
  • search for answers by visualizing, transforming and modeling your data.
  • Use what you learn to refine your questions and/or generate new questions.

This is not a formal process and you should feel free to investigate every idea that occurs to you. With the exploratory data analysis we can now the quality of our data, it it meet or not our expectations, if it is necessary to clean, transform, etc.

1.1 Basic data management

1.1.1 Setting up a working directory (working directory)

As with most of the projects, we start choosing a working directory to indicate R where are our files.

getwd()                       # Current working directory 
setwd("C:/GEO77_R_course")    # Set a working directory
setwd("C:\\GEO77_R_course")   # another valid way to select the same working directory
setwd("C:\GEO77_R_course")    # an invalid way of selecting the same working directory

1.1.1.1 And if I do not want to write code?

  • or go to Session/Set Working Directory/Choose directory… and select folder

  • or press Ctrl+⇧Shift+H and select folder

Why?

  • Clarity
  • Simplification of data management
  • Facilitates access to our data

1.1.2 Loading data

R has the ability to load multiple file formats, which can be further extended through packages.

Reading a dataset read.table()

# relative file location
file_location <- "C:/GEO77_R_course/data/soil_data.txt"
# absolute file location
file_location <- "./data/soil_data.txt"
soil_data <- read.table(file_location, header = TRUE)
names(soil_data)
##  [1] "ID"            "X_coord"       "Y_coord"       "Id"           
##  [5] "pH"            "P"             "K"             "Mg"           
##  [9] "horiz"         "upper_depth"   "lower_depth"   "stones"       
## [13] "coarse_sand"   "medium_sand"   "fine_sand"     "coarse_silt"  
## [17] "medium_silt"   "fine_silt"     "clay"          "soil_type_ka4"
## [21] "pH_cacl2"
str(soil_data)
## 'data.frame':    57 obs. of  21 variables:
##  $ ID           : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ X_coord      : int  4428062 4427932 4428028 4428092 4428089 4428105 4428122 4428129 4428086 4427954 ...
##  $ Y_coord      : int  5749930 5749890 5749724 5749577 5749360 5749189 5749044 5748889 5748785 5748947 ...
##  $ Id           : int  110 111 112 113 114 115 116 117 118 119 ...
##  $ pH           : num  6.3 7 6.1 6.6 6 5.5 5.5 5.2 5.2 5.3 ...
##  $ P            : num  25.4 88 11.7 7.8 4.6 4.4 6.4 5.5 12.6 9.5 ...
##  $ K            : num  16 23 22 18 16 19 15 13 17 17 ...
##  $ Mg           : num  7.3 7.4 8.2 7.8 7.6 6.9 6 5.9 5.8 6.7 ...
##  $ horiz        : chr  "Ap" "Ap" "Ap" "Ap" ...
##  $ upper_depth  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lower_depth  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ stones       : num  1.1 3.4 1.5 0.04 0.2 0.4 1.4 1.3 26.2 24.7 ...
##  $ coarse_sand  : num  1.4 3.2 2.1 0.8 0.6 0.8 2 2.2 8.3 8.9 ...
##  $ medium_sand  : num  5.8 14.7 6.5 1.1 1.1 1.5 2.9 2.9 8.7 9.7 ...
##  $ fine_sand    : num  27.6 6.9 3.9 1.7 1.7 2.8 3.7 3.9 8.9 9.3 ...
##  $ coarse_silt  : num  39.5 42.1 47.1 53.3 52.3 52.8 51.5 52.4 43.9 42.9 ...
##  $ medium_silt  : num  11 10.6 13.7 13.7 16.1 15.2 14.8 14.5 10 12 ...
##  $ fine_silt    : num  2.9 5.9 6.2 6.5 7.6 6.6 6.6 7.2 6.2 5.6 ...
##  $ clay         : num  11.8 16.6 20.5 22.9 20.6 20.3 18.5 16.9 14 11.6 ...
##  $ soil_type_ka4: chr  "Uls" "Uls" "Ut4" "Ut4" ...
##  $ pH_cacl2     : num  7.33 7.26 5.87 6.79 5.77 6.3 6.46 5.79 5.21 7.25 ...
ID X_coord Y_coord Id pH P K Mg horiz upper_depth lower_depth stones coarse_sand medium_sand fine_sand coarse_silt medium_silt fine_silt clay soil_type_ka4 pH_cacl2
0 4428062 5749930 110 6.3 25.4 16 7.3 Ap 0 3 1.10 1.4 5.8 27.6 39.5 11.0 2.9 11.8 Uls 7.33
1 4427932 5749890 111 7.0 88.0 23 7.4 Ap 0 3 3.40 3.2 14.7 6.9 42.1 10.6 5.9 16.6 Uls 7.26
2 4428028 5749724 112 6.1 11.7 22 8.2 Ap 0 3 1.50 2.1 6.5 3.9 47.1 13.7 6.2 20.5 Ut4 5.87
3 4428092 5749577 113 6.6 7.8 18 7.8 Ap 0 3 0.04 0.8 1.1 1.7 53.3 13.7 6.5 22.9 Ut4 6.79
4 4428089 5749360 114 6.0 4.6 16 7.6 Ap 0 3 0.20 0.6 1.1 1.7 52.3 16.1 7.6 20.6 Ut4 5.77
5 4428105 5749189 115 5.5 4.4 19 6.9 Ap 0 3 0.40 0.8 1.5 2.8 52.8 15.2 6.6 20.3 Ut4 6.30
6 4428122 5749044 116 5.5 6.4 15 6.0 Ap 0 3 1.40 2.0 2.9 3.7 51.5 14.8 6.6 18.5 Ut4 6.46
7 4428129 5748889 117 5.2 5.5 13 5.9 Ap 0 3 1.30 2.2 2.9 3.9 52.4 14.5 7.2 16.9 Ut3 5.79
8 4428086 5748785 118 5.2 12.6 17 5.8 Ap 0 3 26.20 8.3 8.7 8.9 43.9 10.0 6.2 14.0 Uls 5.21
9 4427954 5748947 119 5.3 9.5 17 6.7 Ap 0 3 24.70 8.9 9.7 9.3 42.9 12.0 5.6 11.6 Uls 7.25
10 4427887 5749114 120 6.5 18.9 22 7.0 Ap 0 3 10.50 8.4 8.0 8.7 40.3 11.5 6.1 17.0 Lu 6.27
11 4427836 5749319 121 6.1 7.2 28 9.1 Ap 0 3 1.20 2.6 2.5 6.4 48.0 15.2 4.5 20.8 Ut4 6.81
12 4427950 5749243 122 5.5 7.1 21 7.0 Ap 0 3 33.90 7.3 8.8 7.3 44.3 12.5 4.3 15.5 Uls 5.86
13 4427929 5749496 123 6.0 5.1 19 7.6 Ap 0 3 0.20 0.7 1.5 2.6 52.0 15.7 5.8 21.7 Ut4 6.04
14 4427807 5749625 124 6.4 7.9 22 9.5 Ap 0 3 0.60 1.1 2.2 4.3 50.6 13.7 6.0 22.1 Ut4 6.29
15 4427786 5749495 125 6.3 9.5 34 7.6 Ap 0 3 1.00 2.2 2.6 4.5 51.5 13.4 6.0 19.8 Ut4 5.88
16 4427579 5749612 126 6.9 21.1 25 5.1 Ap 0 3 8.70 2.4 3.0 7.3 44.6 12.5 7.6 22.6 Lu 7.05
17 4427668 5749765 127 6.7 17.0 24 6.6 Ap 0 3 12.70 5.7 7.3 7.7 42.9 10.3 4.0 22.1 Lu 7.10
18 4427780 5749847 128 6.2 31.3 31 9.7 Ap 0 3 0.50 1.3 3.2 5.7 47.8 11.7 6.4 23.9 Ut4 5.49
19 4427563 5749885 129 7.1 23.6 29 5.9 Ap 0 3 3.30 2.9 3.8 5.0 47.6 11.6 5.1 24.0 Lu 7.36
20 4427461 5749819 130 7.0 24.3 32 7.3 Ap 0 3 0.90 3.2 3.1 13.3 42.5 16.2 5.0 16.7 Uls 7.02
21 4427347 5749787 131 6.8 20.1 27 6.8 Ap 0 3 1.70 10.2 5.5 31.9 27.8 7.5 4.2 12.9 Sl4 6.91
22 4427303 5749690 132 7.2 25.0 21 3.0 Ap 0 3 2.60 28.8 7.5 23.1 19.1 5.8 3.8 11.9 Sl3 7.29
23 4427560 5749481 133 7.0 20.5 18 4.1 Ap 0 3 6.60 21.2 6.9 40.2 14.8 2.9 3.1 10.9 Sl3 6.78
24 4427704 5749249 134 6.7 11.1 20 7.0 Ap 0 3 32.80 22.1 12.2 16.8 23.3 8.7 6.9 10.0 Sl3 6.35
25 4427785 5749011 135 7.1 7.6 17 8.1 Ap 0 3 3.20 0.3 0.7 4.6 57.4 14.8 6.6 15.6 Ut3 7.49
26 4427618 5749083 136 6.2 5.9 22 8.7 Ap 0 3 0.60 0.8 1.7 6.8 51.5 15.9 7.2 16.1 Ut3 6.22
27 4427660 5748900 137 6.5 6.1 19 10.4 Ap 0 3 0.30 0.5 0.8 5.2 52.5 14.6 7.5 18.9 Ut4 6.80
28 4427849 5748858 138 7.2 8.5 16 7.5 Ap 0 3 0.20 0.4 1.0 6.7 46.1 17.5 8.6 19.7 Ut4 7.40
29 4427440 5748925 139 5.7 3.4 16 8.8 Ap 0 3 0.20 0.3 0.8 5.1 48.1 17.6 6.1 22.0 Ut4 5.16
30 4427476 5749111 140 6.1 3.6 14 8.2 Ap 0 3 0.20 0.5 0.8 4.7 53.1 13.0 6.6 21.3 Ut4 6.56
31 4427512 5749330 141 6.6 7.0 20 7.9 Ap 0 3 0.40 0.4 0.7 4.3 58.7 15.1 4.2 16.6 Ut3 7.46
32 4427357 5749507 142 7.0 11.4 18 5.6 Ap 0 3 0.30 1.4 1.9 11.0 47.1 15.0 4.9 18.7 Ut4 7.19
33 4427194 5749687 143 7.3 16.5 26 6.9 Ap 0 3 0.20 1.1 1.4 7.2 51.4 15.1 6.6 17.2 Ut4 7.41
34 4427095 5749673 144 7.4 13.7 18 8.8 Ap 0 3 0.20 0.8 1.1 5.2 50.7 14.5 6.8 20.9 Ut4 7.42
35 4427179 5749506 145 7.3 12.2 19 6.4 Ap 0 3 0.20 0.3 0.9 7.6 50.3 16.5 7.7 16.7 Ut3 7.46
36 4427373 5749355 146 7.1 11.0 24 7.5 Ap 0 3 11.10 2.8 2.8 16.8 30.4 9.3 8.5 29.4 Lt2 7.31
37 4427351 5749188 147 6.3 4.1 19 8.3 Ap 0 3 0.70 1.3 1.7 7.3 48.4 14.3 7.6 19.4 Ut4 6.51
38 4427197 5749195 148 5.8 6.7 25 7.9 Ap 0 3 5.00 2.7 2.9 13.6 43.7 11.7 7.0 18.4 Lu 4.93
39 4427135 5749339 149 7.2 7.7 29 6.2 Ap 0 3 15.90 7.8 5.7 14.6 26.0 12.2 9.0 24.7 Ls2 7.43
40 4427010 5749504 150 7.3 11.3 17 8.8 Ap 0 3 0.20 0.4 0.7 8.1 46.4 18.6 9.6 16.2 Ut3 7.33
41 4426918 5749618 151 7.2 13.7 21 10.2 Ap 0 3 0.10 0.7 1.3 5.7 47.6 18.7 6.2 19.8 Ut4 7.32
42 4426723 5749512 152 7.3 15.8 18 6.8 Ap 0 3 1.60 1.0 0.9 10.7 47.4 15.2 6.2 18.6 Ut4 7.44
43 4426881 5749397 153 7.3 13.5 18 11.3 Ap 0 3 0.10 0.3 1.0 7.3 60.1 10.4 5.6 15.3 Ut3 7.44
44 4426941 5749277 154 7.2 15.0 28 7.7 Ap 0 3 1.10 1.7 2.0 22.0 40.2 12.4 6.5 15.2 Uls 7.16
45 4427011 5749134 155 6.8 11.0 23 8.5 Ap 0 3 1.20 1.3 1.5 15.5 40.9 15.1 6.6 19.1 Lu 6.44
46 4427115 5749038 156 6.4 21.8 29 8.4 Ap 0 3 9.20 2.5 2.8 15.6 42.6 15.1 6.6 14.8 Uls 6.37
47 4427237 5748957 157 5.6 6.4 20 8.2 Ap 0 3 0.50 0.5 2.5 8.5 44.5 17.8 7.5 18.7 Ut4 5.01
48 4426845 5749062 158 7.1 11.4 18 8.7 Ap 0 3 13.80 1.1 1.2 40.8 28.7 7.8 4.7 15.7 Slu 7.39
49 4426762 5749297 159 7.2 14.6 22 7.1 Ap 0 3 0.10 0.4 0.9 7.4 49.5 15.3 7.5 19.0 Ut4 7.41
50 4426567 5749453 160 7.2 19.1 20 8.1 Ap 0 3 1.20 0.7 0.9 10.3 54.6 11.9 8.1 13.5 Ut3 7.09
51 4426632 5749287 161 7.1 12.1 17 10.8 Ap 0 3 0.10 0.4 1.0 8.8 48.4 14.9 8.1 18.4 Ut4 7.18
52 4426676 5749108 162 7.3 14.2 18 7.5 Ap 0 3 5.00 1.3 2.1 18.2 43.7 12.3 4.6 17.8 Lu 7.41
53 4426481 5749154 163 7.3 18.2 17 8.6 Ap 0 3 1.60 1.4 1.6 19.6 36.1 12.7 4.8 23.8 Lu 7.43
54 4426426 5749333 164 7.3 25.5 20 8.0 Ap 0 3 1.00 1.4 1.2 9.7 52.4 12.3 5.5 17.5 Ut4 7.44
55 4426255 5749196 165 6.9 14.5 17 8.5 Ap 0 3 0.60 1.1 1.4 14.5 47.5 14.6 5.0 15.9 Ut3 6.06
56 4426093 5749162 166 7.2 21.2 19 6.2 Ap 0 3 1.30 1.3 1.8 15.1 50.0 12.5 4.4 14.9 Ut3 7.31

1.1.2.1 don’t forget!

1.1.3 Writing data write.table()

R also allows to export our output/results in different formats:

write.table(soil_data, file = "output/this_is_a_copy_of_soil_data_made_with_R.csv", sep = ",",
            row.names = FALSE, col.names = TRUE)

1.2 Statistical evaluation

When we work with data, is important to describe our data to understand it better. There are a wide variety of statistics used to describe or summarize data in terms of central tendency, shape, position, etc.

pH <- soil_data$pH
pH_cacl2 <- soil_data$pH_cacl2

1.2.1 Descriptive statistics

mean(pH)
## [1] 6.624561
median(pH)
## [1] 6.8
min(pH)
## [1] 5.2
max(pH)
## [1] 7.4
range(pH)
## [1] 5.2 7.4
sd(pH)
## [1] 0.6467724
quantile(pH)
##   0%  25%  50%  75% 100% 
##  5.2  6.2  6.8  7.2  7.4
summary(pH)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.200   6.200   6.800   6.625   7.200   7.400

some functions are not included in base R, but implemented via packages. E.g. moments

library(moments)
skewness(pH)
## [1] -0.7153593

Skewness indicates if our data distribution is symmetric or not:

kurtosis(pH)
## [1] 2.315821

Kurtosis describe is how long is the tail of our data distribution:

1.2.2 Correlation

Pearson correlation coefficient cor()

\[{r_{xy}=\frac{\sum_{i = 1}^{n}(x_i-\overline{x})(y_{i}-\overline{y})}{\sqrt{\sum_{i = 1}^{n}(x_{i}-\overline{x})^2}{\sqrt{\sum_{i = 1}^{n}(y_{i}-\overline{y})^2}}}}\]

where:

  • \(n\) is sample size
  • \(X_{i}\), \(Y_{i}\) are the individual sample points indexed with \(i\)
  • \(\overline{x} = \frac{1}{n}\sum_{x = 1}^{n} x_{i}\) (the sample mean);

and analogously for \({\bar {y}}\)

Examples:

So now with our data…

cor(pH, pH_cacl2)
## [1] 0.7817715

Spurious correlations

1.2.3 Linear regression lm()

in R, the linear models are written as lm(y ~ x, data)

Example:

data <- as.data.frame(cbind(pH, pH_cacl2))
linearModel <- lm(pH ~ pH_cacl2 , data = data)
summary(linearModel)
## 
## Call:
## lm(formula = pH ~ pH_cacl2, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.67409 -0.06007  0.10113  0.20113  0.75084 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.94826    0.50583   3.852 0.000309 ***
## pH_cacl2     0.69322    0.07456   9.298 7.09e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.407 on 55 degrees of freedom
## Multiple R-squared:  0.6112, Adjusted R-squared:  0.6041 
## F-statistic: 86.45 on 1 and 55 DF,  p-value: 7.087e-13
plot(linearModel)

When running this command, we obtain a set of 4 plots that are out of the scope of this “introduction”, but for the curious ones:

  • Residuals vs Fitted : The residuals are distributed following a systematic pattern around the value 0, indicating that the linear regression is not the best. The residuals are also more concentrated in the center, while towards the extremes they show less dispersion, which could indicate heterogeneity among the error variances (heteroscedasticity). Some residuals stand out, indicating the possible presence of outliers.
  • Normal Q-Q: It compares a theoretical normal distribution with the residuals of our model. It should show a straight line for normality assumption and should not show systematic pasterns (should be randomly distributed around that straight line).
  • Scale-location: it shows if residuals are spread equally along the ranges of predictors. This is how you can check the assumption of equal variance (homoscedasticity). It’s good if you see a horizontal line with equally (randomly) spread points.
  • Residuals vs leverage: Unlike the other plots, this time the patterns are not relevant. We look for outlying values in the upper or lower right corner. These places are where cases with a large weight in the linear regression can be located. Look for cases outside the dotted lines, which means they have high Cook’s distance values.

This is just a visual check, not an air-tight proof, so it is somewhat subjective. But it allows us to see at-a-glance if our assumption is plausible, and if not, how the assumption is violated and what data points contribute to the violation.

Source: Understanding Diagnostic Plots for Linear Regression Analysis

1.2.4 Coefficient of determination with Pearson correlation coefficient cor()

\[R^2≡1-\frac{\sum(y_{i}-\hat{y_{i}})^2}{\sum(y_{i}-\overline{y})^2}\]

\[R^2=r×r\]

cor(data$pH, linearModel$fitted.values) * 
cor(data$pH, linearModel$fitted.values)
## [1] 0.6111667

1.2.5 root mean squared error (RMSE)

\[RMSE=\sqrt{\frac{1}{n}\sum_{i = 1}^{n}{(\hat{y_{i}}-y_{i})^2}}\]

sqrt(1 / length(rbind(linearModel$residuals * sum(rbind(linearModel$residuals) ^ 2))))
## [1] 0.1324532

1.2.6 Function for the root mean squared error (RMSE)

Base R does not include a function for RMSE in base R, but in R we can create custom functions as follows:

function_name <- function(input_x, input_y, ...) {
  some_operations
}
rmse <- function(x, y) {
  sqrt(mean((x - y) ^ 2))
}

Requires input

rmse(linearModel$fitted.values, pH_cacl2)
## [1] 0.2527617

1.3 Visualization with plots

The R Graph Gallery

1.3.1 Boxplots boxplot()

boxplot(pH, pH_cacl2, names = c("pH value", "pH in CaCl2 value"), ylab = "pH in water and CaCl2")

  • Data to be displayed: pH and Corg
  • Labeling of the x-axis: names = c(“pH value”, “K content”)
  • Labeling of the y-axis: ylab = “pH value and K content”.

What does the boxplot show?

1.3.2 Histograms hist()

hist(pH,
     xlab = "pH value", 
     ylab = "Frequency", 
     main = ""
     )

1.3.3 Density plots plot(density()

plot(density(pH), 
     xlab = "pH value",
     ylab = "Frequency", 
     main = ""
     )

1.3.4 Scatterplots plot(x, y)

plot(pH, 
     pH_cacl2, 
     xlab = "pH value",
     ylab = "pH in CaCl2 value"
     )

1.3.5 Design options

Directly in the graphic routines (help with ?par)

  • Set colors with col = ...
  • Set symbol properties with pch = ..., sizes with cex = ...
  • Set title with main = ..., axis label with xlab = ..., ylab = ...
  • Set drawing area with xlim = ..., ylim = ...

After drawing a graphic

  • Complete lines and points with lines(...) or points(...) respectively.
  • Add captions (texts) with text(...) or mtext(...)
  • Complete titles with title(...)
  • Complete legend with legend(...)

Output adjustment

  • main Title of the diagram
  • xlab labeling of the x-axis
  • ylab labeling of the y-axis
  • breaks frequency classes number of bars
  • col color filling of the bars
  • cex gradual scaling of text size
hist(pH, 
     xlab = "pH value", 
     ylab = "Frequency", 
     main = "",
     breaks = seq(5, 7.5, 0.25), 
     col = "red",
     cex = 0.5
     )

1.4 All roads lead to Rome

When working with code, it is very common that there are several ways to solve the same problem.

Exercise: Frequency distribution grain size types KA4

  • Calculate all statistical parameters Create boxplots and histograms for clay content and pH (CaCl2)
  • Calculate correlations between for clay content, pH (CaCl2) and all other parameters
  • Evaluation of the results

And now for the frequency distribution grain size types KA4!

Hint: The three different version are made with a barplot(), plot() and hist()

1.4.1 Variant A

1.4.2 Variant B

1.5 Multiple plots in one figure

When you set graphical parameters in Rstudio using this method, it is important to set the properties, plot the figures and reset the properties to the way they were before. Otherwise, every time we plot something, it will follow the set parameters.

par(                              # set or query graphical parameters
  mfrow = c(1, 3),                # 1 x 3 pictures on one plot, equivalent to mfcol = c(3, 1)
  mar = c(5.1, 4.1, 4.1, 2.1),    # margins as c(bottom, left, top, right)
  oma = c(0, 0, 0, 0),            # outer margins in lines of text as c(bottom, left, top, right)
  mgp = c(3, .1, 0),              # margins line for axis title, axis label and axis line
  las = 0,                        # label axis style 
  cex.lab = 1,                    # size of the labels
  cex.axis = 1,                   # size of the axis annotation 
  xpd = FALSE                     # If FALSE, all plotting is clipped to the plot region, if TRUE, all plotting is clipped to the figure region, and if NA, all plotting is clipped to the device region.
    )

plot(…); plot(…); plot(…)

par(mfrow = c(1, 1), 
    mar = c(5.1, 4.1, 4.1, 2.1), 
    oma = c(0, 0, 0, 0), 
    mgp = c(3, 1, 0), 
    las = 0, 
    cex.lab = 1, 
    cex.axis = 1,
    )

1.6 Plots file output

In R it is possible to export images in the most common formats and can be extended from packages. The output format must be specified and the file will be generated by default in our working directory.

tiff("filename.tiff", width = 21, height = 8, units = "cm", res = 300)

par(…)
    plot(…); plot(…); plot(…)
par(…)

dev.off()

2 Do I have to learn all these commands by heart?

No! R and Rstudio provide several tools to help us do this:

  • comments: when running our script, R will ignore all the text that start with #. Is a good practice to comment our code, is very useful if somebody else need to understand your code or for yourself in the future. In Rstudio, you can comment/uncomment the selected text with Ctrl+⇧Shift+C
  • autocomplete: Rstudio will suggest possible parameter for a function. Inside a function press the key Tab ↹, and it will display a list with the possible parameters for that function. move up and down and select with ↵ Return.
  • help: you can select a function and press F1, write ?function, help(function) or help('function') in the Rstudio console, and it will display the corresponding documentation for that function in the help panel.
  • cheatsheets: It is very common that for the most popular packages there are cheat sheets summarizing the most important functions

3 Libraries

Many functions are not included in the basic version of R. Therefore, there is an almost confusing variety of additional libraries for special applications. Examples:

  • ggplot2 and lattice for advanced graphics
  • dplyr, reshape2 and tidyr for data manipulation
  • sp and sf for spatial data (shapefiles, geopackage, etc.)
  • raster, terra, stars for spatial raster data
  • caret as wrapping function for machine-learning libraries
  • RQGIS and RSAGA as bridges to QGIS and SAGA GIS
  • Currently, 18875 available packages in the CRAN package repository
  • some others available on Github

4 How to load a library?

In the console or the script

install.packages(ggplot2)   # install a library, this has to be executed only once per environment

library(ggplot2)                # loading a library
detach(ggplot2)               # Removing a loaded library

# automatic check if packages are installed
rp <- c("DescTools", "rstudioapi")                              # required packages
ip <- rp[!(rp %in% installed.packages()[, "Package"])]    # subset packages

if(length(ip)) install.packages(ip, dependencies = TRUE)    # install packages

lapply(rp, require, character.only = TRUE)                  # load packages
rm(np, rp)                                                              # remove vectors used for load files